
[SPARK-27754][K8S] Introduce additional config (spark.kubernetes.driver.request.cores) for driver request cores for spark on k8s #24630

Closed
wants to merge 3 commits

Conversation

arunmahadevan
Contributor

@arunmahadevan arunmahadevan commented May 16, 2019

What changes were proposed in this pull request?

Spark on k8s supports a config for specifying the executor CPU requests
(spark.kubernetes.executor.request.cores), but a similar config is missing
for the driver. Instead, the spark.driver.cores value is currently used, and it is defined as an integer.

Although the pod spec can specify `cpu` for fine-grained control, as in the following, this PR proposes an additional configuration, `spark.kubernetes.driver.request.cores`, for the driver request cores.

```
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
```
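
As a sketch of how the proposed config would be used at submit time (the cluster URL, image, and values below are illustrative, not part of this PR), the driver and executor CPU requests could then be set symmetrically:

```
# Illustrative spark-submit fragment; master URL and image are placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:443 \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.driver.request.cores=250m \
  --conf spark.kubernetes.executor.request.cores=0.5 \
  ...
```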

How was this patch tested?

Unit tests

Spark on k8s supports a config for specifying the executor CPU requests
(spark.kubernetes.executor.request.cores), but a similar config is missing
for the driver. `spark.driver.cores` works, but it's not evident that it accepts
fractional values (it's defined as an integer config but apparently
accepts decimals). To keep in sync
with the executor config, a similar driver config can be
introduced (spark.kubernetes.driver.request.cores) for explicitly specifying
the driver CPU requests. If not provided, the value will default to `spark.driver.cores` as before.
@SparkQA

SparkQA commented May 16, 2019

Test build #105472 has finished for PR 24630 at commit 4002b74.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@felixcheung felixcheung left a comment


LGTM

@felixcheung
Member

@mccheah ?

```
  conf.get(KUBERNETES_DRIVER_REQUEST_CORES).get
} else {
  driverCpuCores
}
```
Member


Thank you for making a PR, @arunmahadevan . Could you rewrite like the following one-liner?

```diff
-  private val driverCoresRequest = if (conf.contains(KUBERNETES_DRIVER_REQUEST_CORES)) {
-    conf.get(KUBERNETES_DRIVER_REQUEST_CORES).get
-  } else {
-    driverCpuCores
-  }
+  private val driverCoresRequest = conf.get(KUBERNETES_DRIVER_REQUEST_CORES.key, driverCpuCores)
```

Contributor Author


Thanks for the suggestion. I just followed the pattern used elsewhere in the code.

```
  .configurePod(basePod)
  .container.getResources
  .getRequests.asScala
assert(requests1("cpu").getAmount === "1")
```
Member


Can we avoid this assumption? It would be better to get the default value of DRIVER_CORES and compare with that.

Contributor Author


IIUC, as per the existing logic, the default value would be "1" if spark.driver.cores is not set, not the default value of spark.driver.cores (which also happens to be 1). I did not want to change that logic.

Member


@arunmahadevan. What I meant was this constant assertion, `=== "1"`. If we change the default value of that configuration, this test will fail. It would be better to get and use the default value from the conf.

Contributor Author


@dongjoon-hyun, the logic in BasicDriverFeatureStep, `private val driverCpuCores = conf.get(DRIVER_CORES.key, "1")`, sets the value of driverCpuCores to "1" if spark.driver.cores is not set in the Spark conf, i.e. we don't use DRIVER_CORES.defaultValue there. Let me know if my understanding is correct.
If so, I cannot do something like `assert(requests1("cpu").getAmount === DRIVER_CORES.defaultValue)` here (assuming this is what you are suggesting).

Member


That should be the following because we should not have a magic number in the code.

```diff
-  private val driverCpuCores = conf.get(DRIVER_CORES.key, "1")
+  private val driverCpuCores = conf.get(DRIVER_CORES)
     val driverCpuQuantity = new QuantityBuilder(false)
-      .withAmount(driverCpuCores)
+      .withAmount(driverCpuCores.toString)
```

Please fix it like the above, and don't use magic numbers.

Contributor Author


updated.

```
  .configurePod(basePod)
  .container.getResources
  .getRequests.asScala
assert(requests3("cpu").getAmount === "100m")
```
Member

@dongjoon-hyun dongjoon-hyun May 17, 2019


If you don't mind, could you avoid repetitions like the following?

```scala
Seq("0.1", "100m").foreach { value =>
  sparkConf.set(KUBERNETES_DRIVER_REQUEST_CORES, value)
  val requests = new BasicDriverFeatureStep(KubernetesTestConf.createDriverConf(sparkConf))
    .configurePod(basePod)
    .container.getResources
    .getRequests.asScala
  assert(requests("cpu").getAmount === value)
}
```

Contributor Author


sure.

@dongjoon-hyun
Member

BTW, @arunmahadevan . The PR title, Introduce config for driver request cores, seems to claim too much. We already have a configuration, spark.driver.cores. And, it's intentionally designed like that. This PR should focus on the new benefits.

Member

@dongjoon-hyun dongjoon-hyun left a comment


Although I don't know the history of the decision, I'll take a look again after revision. Thanks.

Contributor

@onursatici onursatici left a comment


It is worth noting that pod templates support overriding the Spark container CPU requests: https://github.com/apache/spark/blob/master/docs/running-on-kubernetes.md#pod-template

If you pass a pod template with a single container that has resource requests set, Spark will use it as the base when building the pod resources.
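
As a hedged illustration of that alternative (the file name, container name, and values below are hypothetical, not taken from this PR), a pod template carrying the driver CPU request might look like:

```
# driver-template.yaml (hypothetical example)
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: spark-kubernetes-driver
      resources:
        requests:
          cpu: "250m"
          memory: "64Mi"
```

and would be passed with something like `--conf spark.kubernetes.driver.podTemplateFile=driver-template.yaml`.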

@dongjoon-hyun
Member

dongjoon-hyun commented May 17, 2019

Ah, right. As @onursatici pointed out, the pod spec can specify cpu for fine-grained control (https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-memory).

```
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
```

Since there is a more general way, the benefit of this PR is very limited. Please update the PR description, @arunmahadevan .

Or, could you enhance the K8s documentation about this instead of adding this third way of specifying the CPU? In fact, I prefer the documentation update.

@arunmahadevan arunmahadevan changed the title [SPARK-27754][K8S] Introduce config for driver request cores [SPARK-27754][K8S] Introduce additional config (spark.kubernetes.driver.request.cores) for driver request cores for spark on k8s May 17, 2019
@arunmahadevan
Contributor Author

Updated title to Introduce additional config (spark.kubernetes.driver.request.cores) for driver request cores for spark on k8s

@arunmahadevan
Contributor Author

Or, could you enhance the K8s documentation about this instead of adding this third way of specifying the CPU? In fact, I prefer the documentation update.

I agree we have an alternate way of specifying the driver CPU request cores for Spark on k8s via the pod template. However, we already have spark.kubernetes.executor.request.cores for the executor, and the missing driver counterpart leads to confusion. Also, users may just want to specify this directly via the configs rather than providing a pod template. So the point is: should we then consider removing all the "spark.kubernetes" configs that can be specified via pod templates?

@SparkQA

SparkQA commented May 17, 2019

Test build #105500 has finished for PR 24630 at commit 8ca63d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

I didn't suggest removing, @arunmahadevan. That kind of PR will be considered negatively too, in the same way. You know, we are usually conservative on both sides (adding and removing).

So the point is: should we then consider removing all the "spark.kubernetes" configs that can be specified via pod templates?

@arunmahadevan
Contributor Author

I didn't suggest removing, @arunmahadevan. That kind of PR will be considered negatively too, in the same way. You know, we are usually conservative on both sides (adding and removing).

In this case, I think we become more consistent by introducing a k8s driver config for which the executor config already exists.

@SparkQA

SparkQA commented May 18, 2019

Test build #105509 has finished for PR 24630 at commit ff688c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment


Thank you for updates, @arunmahadevan . I updated the PR description with #24630 (comment).

I'll wait for the other reviewers' comments.

@dongjoon-hyun
Member

Merged to master. Thank you, @arunmahadevan and @felixcheung .

@arunmahadevan
Contributor Author

Thank you @dongjoon-hyun for taking time to review and merge.

@marxangels

Very inflexible. I'm looking for a config option to do something like this:

```
resources:
  requests:
    memory: "911Mi"
```

maybe `--conf spark.kubernetes.executor.request.memory=150Mi`

Unfortunately, it doesn't work.
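
A hedged editorial aside (not from the original thread): executor pod memory requests are derived from the existing memory settings rather than from a dedicated request config, so fragments like the following (illustrative values) are the supported way to influence them:

```
# Illustrative fragment; the executor pod memory request is computed from
# spark.executor.memory plus the memory overhead, not from a dedicated
# spark.kubernetes.executor.request.memory config.
--conf spark.executor.memory=800m \
--conf spark.kubernetes.memoryOverheadFactor=0.1
```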

@marxangels

This should be done through mapping, not one by one.

@marxangels

The code is not flexible enough to support "unknown" k8s configuration items (why not use a rule mapping instead of enumerating them one by one?).

Alexis-D pushed a commit to Alexis-D/spark that referenced this pull request Nov 16, 2020
…er.request.cores) for driver request cores for spark on k8s

Spark on k8s supports a config for specifying the executor CPU requests
(spark.kubernetes.executor.request.cores), but a similar config is missing
for the driver. Instead, the `spark.driver.cores` value is currently used, and it is defined as an integer.

Although the `pod spec` can specify `cpu` for fine-grained control, as in the following, this PR proposes an additional configuration, `spark.kubernetes.driver.request.cores`, for the driver request cores.
```
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
```

Unit tests

Closes apache#24630 from arunmahadevan/SPARK-27754.

Authored-by: Arun Mahadevan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
rshkv pushed a commit to palantir/spark that referenced this pull request Nov 17, 2020
…er.request.cores) for driver request cores for spark on k8s
